System and method for speech recognition using a reduced user dictionary, and computer readable storage medium therefor

ABSTRACT

A speech recognition system is provided that performs recognition processing rapidly in a speech recognition device while maintaining the quality of speech recognition. The speech recognition system includes a speech input device which inputs speech and displays a recognition result, and a speech recognition device which receives the speech from the speech input device, performs recognition processing, and sends the recognition result back to the speech input device. The speech input device includes a user dictionary section which stores words used for recognizing the input speech, and a reduced user dictionary creation unit which extracts words corresponding to the input speech from the user dictionary and creates a reduced user dictionary. The speech recognition device has a speech recognition unit which receives the input speech and the reduced user dictionary from the speech input device and recognizes the input speech based on the reduced user dictionary and a system dictionary provided beforehand.

TECHNICAL FIELD

The present invention relates to a server-client type speech recognition system, a speech recognition method, and a speech recognition processing program, in which speech is input at a client terminal device and speech recognition processing is performed in a server connected over a network.

BACKGROUND ART

In a server-client type speech recognition system, where to place the dictionary for speech recognition is an important design consideration. Because the engine performing speech recognition is provided in the server, it is reasonable to provide the dictionary for speech recognition in the server as well, where the engine can access it easily. This is because the network line connecting a client terminal device (hereinafter referred to as a “client”) and a server generally has a lower data transfer speed and higher communication costs than the data bus that serves as the data transmission path inside the server.

On the other hand, it is sometimes desirable to change the vocabulary for speech recognition for each client, for example to include words used only by a particular client. In such a case, it is convenient for management to store a dictionary for speech recognition containing such client-specific words on the client side. Accordingly, in a server-client type speech recognition system, speech recognition processing is generally carried out using both a dictionary for speech recognition provided in the server and a dictionary for speech recognition provided in the client. An example of a system that performs speech recognition processing using both of these dictionaries has been proposed (see Patent Document 1).

A speech recognition system shown in FIG. 8 includes a client 100 having a speech recognition engine 104 and a recognition dictionary 103, and a server 110 having a speech recognition engine 114 and a recognition dictionary 113. This speech recognition system generally operates as follows. When speech is input from a speech input section 102, the client 100 refers to the recognition dictionary 103 controlled by a dictionary control section 106 and performs speech recognition processing with the speech recognition engine 104. When the speech recognition processing succeeds and a speech recognition result is obtained, the result is output via a result integration section 107.

In contrast, when the speech recognition processing fails and the speech recognition result is rejected, the client 100 transmits the input speech data to the server 110 through a speech transmission section 105. The server 110 receives the speech data at a speech reception section 112, refers to the recognition dictionary 113 controlled by a dictionary control section 115, and performs speech recognition processing with the speech recognition engine 114. The obtained speech recognition result is transmitted to the client 100 by a result transmission section 116 and is output via the result integration section 107.

In summary, if the client itself obtains a speech recognition result, that result is used as the output of the speech recognition system; if it cannot, the server performs speech recognition processing and its result is used as the output of the system.

Another system that performs speech recognition processing using a dictionary for speech recognition provided in a server and one provided in a client has also been proposed (see Patent Document 2). A speech recognition system shown in FIG. 9 includes a client 200 having a storage section 204 that stores a user dictionary 204A, speech recognition data 204B, and dictionary management information 204C, and a server 210 having a recognition dictionary 215 and a speech recognition section 214. The client 200 and the server 210 communicate with each other via a communication section 202 on the client 200 side and a communication section 211 on the server 210 side.

This speech recognition system generally operates as follows. Prior to speech recognition processing, the client 200 transmits the user dictionary 204A to the server 210 through the communication section 202. The client 200 then transmits the speech data input from a speech input section 201 to the server 210 through the communication section 202. The server 210 performs speech recognition processing with the speech recognition section 214, using the user dictionary 204A received by the communication section 211 and the recognition dictionary 215 managed by a dictionary management section 212.

-   Patent Document 1: Japanese Patent Laid-Open Publication No. 2003-295893
-   Patent Document 2: Japanese Patent No. 3581648

DISCLOSURE OF THE INVENTION

Problems to be Solved by the Invention

However, the speech recognition systems of the above techniques involve the following problems.

First, in the art described in Patent Document 1, speech recognition processing cannot use the recognition dictionary on the client and the recognition dictionary on the server together. This is because the system of Patent Document 1 first performs speech recognition processing using only the recognition dictionary on the client and, only when that processing fails, performs speech recognition processing using only the recognition dictionary on the server. Consequently, when a correct recognition result contains several words, some of which are included only in the client-side recognition dictionary and others only in the server-side recognition dictionary, this system cannot obtain the correct result.

Further, in the art of Patent Document 1, speech recognition processing is first performed on the client side, the client side determines whether that processing succeeded, and speech recognition processing is performed on the server side only when it failed. Therefore, if the client erroneously judges a failed recognition to be successful, that result is adopted as the recognition result of the entire system. The accuracy of the client's speech recognition processing thus largely determines the accuracy of the entire system.

However, the resources available in a client terminal are generally smaller than those of a server, and the accuracy of speech recognition processing on the client is generally lower than that of processing performed in the server. As a result, the recognition accuracy of the system as a whole is difficult to improve.

Further, in the art described in Patent Document 2, the recognition dictionary on the client is transmitted to the server prior to speech recognition processing, and the server performs speech recognition processing using both the transmitted recognition dictionary and its own recognition dictionary. Because a large amount of data is transmitted before speech recognition processing, this system has the disadvantage of requiring high communication costs and long communication times. Note that Patent Document 2 mentions a method in which an input form identifier is designated and managed for each recognition vocabulary item, and the vocabulary in the user dictionary that is subject to speech recognition is narrowed down using information about the input form of the current input object.

However, this narrowing-down method is applicable only when the information used for narrowing down the vocabulary (in this case, the input form information) is given before the user speaks. It therefore has the disadvantage of not being applicable to general speech recognition systems in which such additional information is unavailable.

An object of the present invention is to provide a server-client type speech recognition system, a speech recognition method, and a speech recognition processing program capable of performing speech recognition rapidly while maintaining its quality and without increasing the load on the system.

Means for Solving the Problems

In order to achieve the object, a speech recognition system according to the present invention is a speech recognition system for recognizing an input speech converted into an electric signal, including: a user dictionary section which stores a user dictionary to be used for speech recognition; a reduced user dictionary creation unit which creates a reduced user dictionary by eliminating, from the user dictionary, words determined to be unnecessary for recognizing the input speech; and a speech recognition unit which adds the reduced user dictionary to a system dictionary provided beforehand and recognizes the input speech based on the system dictionary and the reduced user dictionary.

A speech recognition method according to the present invention is a speech recognition method for recognizing an input speech converted into an electric signal, including: creating a reduced user dictionary by eliminating, from a user dictionary, words determined to be unnecessary for recognizing the input speech; adding the reduced user dictionary to a system dictionary provided beforehand; and recognizing the input speech based on the system dictionary and the reduced user dictionary.

A speech recognition program according to the present invention is a speech recognition program for recognizing an input speech converted into an electric signal. The program causes a computer of the client terminal device to perform a function of creating a reduced user dictionary by eliminating, from a user dictionary, words determined to be unnecessary for recognizing the input speech, and causes a computer of the server to perform a function of adding the reduced user dictionary to a system dictionary provided beforehand and recognizing the input speech based on the reduced user dictionary and the system dictionary.

Effects of the Invention

Because the present invention transmits the input speech and a reduced user dictionary from the speech input device when speech recognition processing is performed in the speech recognition device, the speech recognition device can recognize the input speech based on the reduced user dictionary and the system dictionary while maintaining the quality of speech recognition. Further, because the speech input device transmits the reduced user dictionary, which has a smaller data size, instead of the user dictionary, the amount of data transmitted to the speech recognition device and the communication costs can be reduced significantly compared with transmitting the entire user dictionary, and the data transmission time and the speech recognition processing time in the speech recognition device can also be reduced significantly. Accordingly, speech recognition can be performed rapidly while maintaining its quality and without increasing the load on the system.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, exemplary embodiments of the invention will be described with reference to the accompanying drawings.

First Exemplary Embodiment

An exemplary configuration of a speech recognition system according to a first exemplary embodiment of the invention will be described with reference to FIG. 1.

In FIG. 1, the speech recognition system according to the exemplary embodiment includes a client terminal device (hereinafter referred to as “client”) 10 serving as a speech input device, and a server 20 serving as a speech recognition device. The client 10 includes a speech input section 11 which inputs speech; a user dictionary section 12 storing words used for speech recognition; a reduced user dictionary creation section 13 serving as a reduced user dictionary creation unit, which eliminates from the user dictionary section 12 the words determined to be unnecessary for the input speech and thereby creates a reduced user dictionary; and a client communication section 14 which transmits the input speech and the reduced user dictionary to the server 20. Reference numeral 13D indicates a reduced user dictionary section storing the reduced user dictionary created by the reduced user dictionary creation section 13. Further, reference numeral 15 indicates a recognition result output section which outputs and displays, on a display screen, the recognition result produced by speech recognition in the server 20 and transmitted back to the client 10.

The server 20 includes a system dictionary section 21 storing words to be used for speech recognition, a server communication section 23 which receives the input speech and the reduced user dictionary transmitted from the client 10, and a speech recognition section 22 serving as a speech recognition unit which performs speech recognition processing on the input speech using the system dictionary and the reduced user dictionary.

Accordingly, speech recognition processing performed in the server 20 of the exemplary embodiment can obtain substantially the same speech recognition result as when both the system dictionary and the user dictionary are used. Further, the amount of data transferred from the client 10 to the server 20 and the communication costs can be reduced compared with transmitting the entire user dictionary.

Specifically, the reduced user dictionary is a dictionary in which words having a high likelihood of being included in the input speech are selected from the words stored in the user dictionary 12. The reduced user dictionary creation section 13 compares the words stored in the user dictionary 12 with the input speech, calculates the likelihood of each word appearing in the input speech, and creates the reduced user dictionary by selecting words with high likelihoods based on the calculation results.

Thus, the words present in the user dictionary but absent from the reduced user dictionary are those judged to have a low likelihood of being included in the input speech, and speech recognition processing obtains substantially the same result as when both the system dictionary and the user dictionary are used.

Further, the processing performed by the client 10 only determines whether each word of the user dictionary is likely to be included in the input speech. At this stage, care need only be taken not to miss words that actually appear, and this processing does not directly affect the accuracy of speech recognition.

Further, the reduced user dictionary creation section (reduced user dictionary creation means) 13 creates the reduced user dictionary by a word spotting method using the user dictionary 12.

This will now be described in detail. In FIG. 1, the client 10 includes the speech input section 11, the user dictionary section 12, the reduced dictionary creation section 13, and the client communication section 14, as described above. The server 20 includes the system dictionary section 21, the speech recognition section 22, and the server communication section 23. The client communication section 14, which communicates with the server 20, and the server communication section 23, which communicates with the client 10, are connected over a communication network 120.

In the client 10, the speech input section 11 may be configured of a microphone and an A/D converter, for example. The user dictionary section 12 is formed of a storage section such as a hard disk or a nonvolatile memory and stores dictionary data. The reduced dictionary creation section 13 creates a reduced user dictionary from the user dictionary while referring to the input speech; in the exemplary embodiment, it is configured of a microprocessor having a random access memory (RAM) and a central processing unit (CPU) which executes computer programs stored in the RAM. The client communication section 14 performs data communications using a wired LAN, a wireless LAN, or a mobile telephone network, for example.

The server 20 is formed of a personal computer or the like, for example. The system dictionary section 21 is formed of, for example, a hard disk storing a dictionary used for speech recognition. The server communication section 23 performs data communications with the client 10 using a LAN or the like. The speech recognition section 22 performs predetermined speech recognition processing while referring to the system dictionary in the system dictionary section 21. The communication network 120 is configured of a wired LAN, a wireless LAN, or a wireless network used by mobile telephones, for example.

Next, operation of the first exemplary embodiment will be described with reference to FIG. 2.

First, a user inputs speech from the speech input section 11 of the client 10 (step S101: speech input step). In response, the reduced dictionary creation section 13 refers to the speech data input at step S101 and creates a reduced user dictionary from the user dictionary section 12 (step S102: reduced user dictionary creation step).

Specifically, the reduced user dictionary is created by selecting, from the words in the user dictionary stored in the user dictionary section 12, the words having high likelihoods of being included in the input speech; it is therefore a partial dictionary of the user dictionary. That is, when speech to be recognized is input, the reduced user dictionary is created from the user dictionary in the user dictionary section 12 as a dictionary corresponding to that input speech. Although the reduced user dictionary contains only some of the words of the user dictionary, the information held for each word is the same as in the user dictionary. The reduced user dictionary created in this manner is stored in the reduced user dictionary section 13D.

Next, the client communication section 14 transmits the speech data input at step S101 and the reduced user dictionary created at step S102 to the server communication section 23 of the server 20 over the communication network 120 (step S103: transmission step).
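The transmission at step S103 and the reception at step S104 might be pictured with the following minimal sketch. The JSON layout, the field names `speech` and `reduced_user_dictionary`, the base64 encoding of the audio, and the pronunciation-to-writing entry format are all hypothetical choices made for illustration; the embodiment does not specify a message format.

```python
import base64
import json

# Hypothetical entry format: pronunciation -> writing. The embodiment only
# says that each entry holds a writing and a pronunciation.
Dictionary = dict[str, str]


def build_request(speech: bytes, reduced_user_dictionary: Dictionary) -> bytes:
    """Client side (step S103): bundle the input speech and reduced dictionary."""
    message = {
        # Raw audio bytes are base64-encoded so that they can travel in JSON.
        "speech": base64.b64encode(speech).decode("ascii"),
        "reduced_user_dictionary": reduced_user_dictionary,
    }
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def parse_request(payload: bytes) -> tuple[bytes, Dictionary]:
    """Server side (step S104): recover the speech and the reduced dictionary."""
    message = json.loads(payload.decode("utf-8"))
    speech = base64.b64decode(message["speech"])
    return speech, message["reduced_user_dictionary"]
```

Because the dictionary travels in the same message as the speech, the payload grows with every entry included, which is why sending only the reduced dictionary keeps the request small.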

The server communication section 23 of the server 20 then receives the speech data and the reduced user dictionary transmitted from the client 10 (step S104). The speech recognition section 22 on the server side performs speech recognition processing on the received speech data using both the system dictionary in the system dictionary section 21 and the received reduced user dictionary (step S105: speech recognition step).

When the recognition result for the input speech is sent back to the client 10, the client 10 outputs it to the outside (input speech output step). In this case, the recognition result output section 15 outputs and displays the result as an image or as characters on a display, for example.

Note that the processing of steps S101 to S105 may be divided between the client 10 side and the server 20 side, implemented as a control program or a data processing program, and executed by a computer provided beforehand on each side.

Next, the configuration of the reduced dictionary creation section 13 will be described with reference to FIG. 3.

The reduced dictionary creation section 13 includes a comparing section 13A which compares the input speech with each word and calculates the likelihood that the word appears in the input speech, a word temporarily storing section 13B which temporarily stores pairs of a target word and its likelihood, and a word selection section 13C which refers to the word temporarily storing section 13B and selects one or more words having high likelihoods.

Next, operation of the reduced dictionary creation section 13 will be described with reference to FIG. 4.

The reduced dictionary creation section 13 repeats the processing of steps S202 and S203 for each word included in the user dictionary 12 (step S201).

At step S202, the reduced dictionary creation section 13 uses the comparing section 13A to calculate the likelihood that the target word is included in the input speech (likelihood calculation step). At step S203, the reduced dictionary creation section 13 associates (pairs) the target word with the calculated likelihood and stores the pair in the word temporarily storing section 13B (word temporarily storing step).

When the above processing has been completed for all of the words included in the user dictionary 12, the reduced dictionary creation section 13 activates the word selection section 13C, which selects words having high likelihoods from among the words stored in the word temporarily storing section 13B (step S204: word selection step). The selected words are edited into the form of a dictionary, and the resulting reduced user dictionary is stored in the reduced user dictionary section 13D (step S205: reduced dictionary creation step).
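The loop of FIG. 4 can be sketched as follows. The likelihood function is passed in as a parameter because the comparing section 13A may use any word-spotting method. The toy `substring_likelihood` scorer, which compares romanized readings rather than acoustic features, is a hypothetical stand-in, as is the fixed `threshold` used at the selection step.

```python
from typing import Callable

# Entry content is kept opaque; the reduced dictionary copies it unchanged.
Entry = dict[str, str]
LikelihoodFn = Callable[[str, str], float]


def create_reduced_dictionary(
    user_dictionary: dict[str, Entry],
    input_speech: str,
    likelihood: LikelihoodFn,
    threshold: float,
) -> dict[str, Entry]:
    # Steps S201-S203: score every word and keep (word, likelihood) pairs,
    # playing the role of the word temporarily storing section 13B.
    scored: list[tuple[str, float]] = []
    for word in user_dictionary:
        scored.append((word, likelihood(word, input_speech)))

    # Step S204: select the words whose likelihood reaches the threshold.
    selected = [word for word, score in scored if score >= threshold]

    # Step S205: build the reduced dictionary. Each entry is identical to the
    # user-dictionary entry, so the result is a partial dictionary.
    return {word: user_dictionary[word] for word in selected}


def substring_likelihood(word: str, input_speech: str) -> float:
    """Hypothetical toy scorer on romanized text: 1.0 if present, else 0.0."""
    return 1.0 if word in input_speech else 0.0
```

In a real client, `input_speech` would be a feature sequence and `likelihood` a word spotter; only the control flow of steps S201 to S205 is meant to carry over.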

Note that the selection processing performed by the word selection section 13C can be executed in various ways. For example, a fixed likelihood can be set beforehand, and words with this likelihood or higher are selected while words with lower likelihoods are not.

Alternatively, a fixed number can be set beforehand, and words are selected in descending order of likelihood until that number is reached.

Naturally, these ways may be combined: for example, words may be selected in descending order of likelihood such that the number of selected words does not exceed the predetermined number, while words with likelihoods below a predetermined minimum are not selected.
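The three selection rules can be expressed with one hypothetical helper, `select_words`, to which a minimum likelihood, a maximum word count, or both may be given. The likelihoods in the usage example are the five values quoted for FIG. 5(b) in this description; the remaining words of that figure are not listed in the text and are omitted.

```python
from typing import Optional


def select_words(
    scored: dict[str, float],
    min_likelihood: Optional[float] = None,
    max_words: Optional[int] = None,
) -> list[str]:
    """Return selected words in descending order of likelihood.

    min_likelihood only: fixed-likelihood rule.
    max_words only: fixed-number rule.
    Both: combined rule (at most max_words, none below min_likelihood).
    """
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    if min_likelihood is not None:
        ranked = [(word, score) for word, score in ranked if score >= min_likelihood]
    if max_words is not None:
        ranked = ranked[:max_words]
    return [word for word, _ in ranked]


scores = {
    "iisutobirejji": 0.2,
    "kuroisutazu": 0.1,
    "sheisutajiamu": 0.8,
    "sheekusupiagaaden": 0.6,
    "meishiizu": 0.5,
}
print(select_words(scores, min_likelihood=0.5))
# ['sheisutajiamu', 'sheekusupiagaaden', 'meishiizu']
```

Applying the threshold before the cap means the cap never admits a word below the minimum, which matches the combined rule described above.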

In practice, the user dictionary 12 can be configured as dictionary data stored in a hard disk or a nonvolatile memory, for example. The word temporarily storing section 13B is configured as a data storage region secured in a hard disk, a nonvolatile memory, or a volatile memory.

The comparing section 13A and the word selection section 13C may be implemented by the CPU executing a computer program stored in memory.

Further, the reduced user dictionary section 13D takes the form of dictionary data stored in a hard disk or a memory, as with the user dictionary section 12.

Because the data stored in the reduced user dictionary section 13D is limited to the words selected by the word selection section 13C, the reduced user dictionary is a partial dictionary of the user dictionary.

The comparing section 13A can be embodied in various ways. For example, a method used for word spotting in the field of speech recognition may be applied directly. Word spotting is a method of picking out necessary words and syllables from input speech, and is described, for example, in the “Report of Standard Technologies” prepared by the Japan Patent Office in 2001, theme “Search Engine”, C-6-(3) “Speech Search”.

In the first exemplary embodiment, it is only necessary to determine, for each word in the user dictionary 12, whether the word can be picked out of the input speech (extraction availability determination step), and to store the word in the word temporarily storing section 13B together with the likelihood calculated during that determination (reduced dictionary creation step).

These steps may be programmed and executed by a computer provided beforehand on the client side.

According to the “Report of Standard Technologies” mentioned above, one method of implementing word spotting uses DP (Dynamic Programming) matching. DP matching is a pattern matching technique for speech recognition that performs time normalization so that the same phonemes in two utterances correspond to each other, and thereby calculates a distance expressing how dissimilar the words are. Suppose, for example, that there are two speech waveforms for one word, represented as time-series patterns A and B, where A is the input speech and B is a standard pattern.
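As a concrete illustration of DP matching, the following minimal sketch computes a dynamic time warping (DTW) distance between an input pattern A and a standard pattern B, each given as a sequence of feature vectors (frames). The Euclidean local distance and the equal-weight step pattern (horizontal, vertical, and diagonal moves) are assumptions for illustration; practical systems use spectral features, weighted step patterns, and slope constraints.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Cumulative DP-matching distance between patterns a and b."""
    n, m = len(a), len(b)
    # d[i][j] is the best cumulative distance aligning a[:i] with b[:j].
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Time normalization: a frame may be repeated or skipped in pairing.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Because the warping path may pair one frame with several frames, a slower utterance of the same word (for example A = 0, 1, 2 against B = 0, 0, 1, 2) still yields a distance of zero.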

When word spotting is performed using DP matching, the standard pattern B of the word to be spotted is shifted one frame at a time from the starting end of the input speech A (a parameter series such as a spectrum), and DP matching is performed against each partial segment of the input speech.

When the distance obtained as a matching result is at or below a threshold, it is determined that the standard pattern (that is, the word) is present at that position.

In the first exemplary embodiment, no such threshold needs to be set. Instead, the sign of the distance value can simply be inverted and output as a likelihood, whatever the distance value is. The sign is inverted because a shorter distance means a higher possibility that the word is included in the input speech, whereas a likelihood should become larger as that possibility becomes higher.
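The spotting procedure and the sign inversion can be sketched as follows: the standard pattern is slid over the input one frame at a time, each segment is compared by DP matching, and the smallest distance is negated to give a likelihood, so no threshold is needed. Using segments of the same length as the standard pattern is a simplifying assumption; spotting algorithms normally let the matched segment stretch or shrink. The Euclidean local distance is likewise an assumption.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Cumulative DP-matching distance with an equal-weight step pattern."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def spotting_likelihood(input_speech: Sequence[Frame], pattern: Sequence[Frame]) -> float:
    """Slide the standard pattern over the input; return the negated best distance."""
    width = min(len(pattern), len(input_speech))
    best = math.inf
    # Shift the start of the compared segment one frame at a time.
    for start in range(len(input_speech) - width + 1):
        segment = input_speech[start:start + width]
        best = min(best, dtw_distance(segment, pattern))
    # A conventional spotter would test best <= threshold here; instead the
    # sign is inverted so that a larger value means "more likely present".
    return -best
```

A word whose standard pattern occurs verbatim in the input therefore scores 0.0, the highest possible value, while poorer matches score increasingly negative values.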

Further, a method of performing word spotting using an HMM (Hidden Markov Model) instead of DP matching is also well known. It is described in detail in “Speech Recognition Based on Probability Models”, 2nd edition (by Seiichi NAKAGAWA, published by the Institute of Electronics, Information and Communication Engineers, 1989), Section 3, 3.4.2 “Phoneme/Syllable/Word Spotting Algorithm”.

As described in detail above, the comparing processing performed by the comparing section 13A can be executed in various modes using well-known art.

Next, the specific operation of the entire first exemplary embodiment will be described in detail using the input examples in FIG. 5 and the flowcharts of FIGS. 2 and 4.

FIG. 5(a) shows an example of the contents of a user dictionary stored in the user dictionary section 12. This user dictionary mainly stores Japanese writings and pronunciations of place names in New York City.

Now, assume that a user speaks (inputs speech) “sheisutajiamuwadokodesuka” to the speech input section 11 of the client 10 (step S101 in FIG. 2).

The reading of this utterance, written in hiragana, is “sheisutajiamuwadokodesuka”. When the user inputs the speech, the reduced dictionary creation section 13 is immediately activated (step S102 in FIG. 2).

Referring to FIG. 4, the reduced dictionary creation section 13 repeatedly calculates, for each of the words stored in the user dictionary 12, the likelihood that the word is included in the input speech, and stores the result in the word temporarily storing section 13B (step S201, comprising steps S202 and S203 in FIG. 4). In this example, the word “iisutobirejji” is first selected as the word whose likelihood is calculated. The reduced dictionary creation section 13 compares this word with the input speech and calculates the likelihood that the word is included in the input speech. If the calculated likelihood is “0.2”, for example, the reduced dictionary creation section 13 stores the dictionary content of the word “iisutobirejji”, that is, its writing/pronunciation set, together with the likelihood “0.2” in the word temporarily storing section 13B.

Next, the target word is changed to the next word in the user dictionary, “kuroisutazu”, and the likelihood is calculated in the same manner. If the calculated likelihood is “0.1”, for example, the reduced dictionary creation section 13 stores the dictionary content of the word “kuroisutazu”, that is, its writing/pronunciation set, together with the likelihood “0.1” in the word temporarily storing section 13B. The reduced dictionary creation section 13 repeats this likelihood calculation and storage in the word temporarily storing section 13B for all words in the user dictionary 12.

FIG. 5(b) shows an example of the contents of the word temporarily storing section 13B when the likelihood calculation and word storage have been completed. In the word temporarily storing section 13B, a calculated likelihood is stored in association with every word included in the user dictionary.

Next, the reduced dictionary creation section 13 uses the word selection section 13C to select words having high likelihoods from the word temporarily storing section 13B (step S204 in FIG. 4). In this example, the word selection section 13C is assumed to be configured to select words having a likelihood of “0.5” or higher. Referring to the contents of FIG. 5(b), the corresponding words are “sheisutajiamu” (likelihood 0.8), “sheekusupiagaaden” (likelihood 0.6), and “meishiizu” (likelihood 0.5), so these three words are selected by the word selection section 13C.

Next, the reduced dictionary creation section 13 outputs the three words selected by the word selection section 13C and creates a dictionary consisting of these three words (step S205 in FIG. 4). The dictionary created in this manner is the reduced user dictionary, and it is stored in the reduced user dictionary section 13D. FIG. 5(c) shows the stored contents.

In FIG. 5(c), the reduced user dictionary consists of the three selected words “sheisutajiamu”, “sheekusupiagaaden”, and “meishiizu”, and the dictionary content of each word is identical to that in the user dictionary shown in FIG. 5(a).

The reduced user dictionary created by the client 10 in this way is transmitted from the client communication section 14 over the communication network 120 to the server communication section 23 of the server 20, together with the input speech data “sheisutajiamuwadokodesuka” (step S103 in FIG. 2).

When the server 20 receives the input speech data and the reduced user dictionary through the server communication section 23 (step S104 in FIG. 2), it performs speech recognition processing with the speech recognition section 22 (step S105 in FIG. 2). This speech recognition processing uses both the reduced user dictionary transmitted from the client 10 and the system dictionary on the server 20 side. FIG. 5(d) shows exemplary contents of the system dictionary stored in the system dictionary section 21 of the server 20.

Referring to FIG. 5(d), in this example the system dictionary section 21 stores general words that are likely to be used in any situation, including demonstratives such as “koko” and “soko”, auxiliary verbs such as “da” and “desu”, case particles such as “ga”, “wo”, and “ni”, the adverbial particle “wa”, the final particle “ka”, the nouns “nippon” and “washinton”, and the interjections “hai” and “iie”.

The speech recognition section 22 performs speech recognition processing on the input speech “sheisutajiamuwadokodesuka” using both the reduced user dictionary and the system dictionary, and obtains the speech recognition result “sheisutajiamu/wa/doko/desu/ka”. Here, the slash “/” is inserted for explanatory purposes only, to indicate the boundaries between recognized words.

In the speech recognition result “sheisutajiamu/wa/doko/desu/ka”, the leading word “sheisutajiamu” is derived from the reduced user dictionary, and all of the following words “wa”, “doko”, “desu”, and “ka” are derived from the system dictionary. The words in the reduced user dictionary originally come from the user dictionary section 12 of the client 10.
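How words from the two dictionaries combine in one result can be illustrated with a text-level toy. The sketch below uses greedy longest-match segmentation over the romanized string as a stand-in for the acoustic search that a real recognizer performs; it is not the recognition method of the speech recognition section 22. The system dictionary list follows the FIG. 5(d) words named above, with “doko” added as an assumption because the recognition result uses it.

```python
def segment(text, reduced_dictionary, system_dictionary):
    """Greedy longest-match segmentation over the union of both dictionaries.

    Returns (word, source) pairs, where source records which dictionary supplied the word.
    """
    vocabulary = {word: "reduced user dictionary" for word in reduced_dictionary}
    for word in system_dictionary:
        vocabulary.setdefault(word, "system dictionary")
    longest = max(len(word) for word in vocabulary)

    result, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocabulary:
                result.append((candidate, vocabulary[candidate]))
                i += length
                break
        else:
            raise ValueError(f"no dictionary word matches at position {i}")
    return result


reduced_user_dictionary = ["sheisutajiamu", "sheekusupiagaaden", "meishiizu"]
system_dictionary = ["koko", "soko", "doko", "da", "desu", "ga", "wo", "ni",
                     "wa", "ka", "nippon", "washinton", "hai", "iie"]  # "doko" is assumed

words = segment("sheisutajiamuwadokodesuka", reduced_user_dictionary, system_dictionary)
print("/".join(word for word, _ in words))  # sheisutajiamu/wa/doko/desu/ka
```

The source tags show the point made above: the first word can only be found through the reduced user dictionary, while the rest come from the system dictionary.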

As described above, in the first exemplary embodiment, a speech recognition result can be obtained even when the input speech combines words from the user dictionary in the user dictionary section 12 on the client 10 side with words from the system dictionary in the system dictionary section 21 on the server 20 side. This is an advantage of the present invention over the conventional art.

Here, the first exemplary embodiment of the invention will be compared with a general-purpose technique in which the entire user dictionary of the client is transferred to the server prior to speech recognition and is used together with the system dictionary in the speech recognition processing.

In the general-purpose technique, the entire user dictionary, that is, all ten words in the example of FIG. 5(a), has to be transmitted. In the first exemplary embodiment of the invention, by contrast, only the data of the three words stored in the reduced user dictionary needs to be transmitted, as described above.

In general, the communication network 120 connecting the client 10 and the server 20 has a slower data transfer speed and a significantly higher data transfer cost than the data bus built into each of the client 10 and the server 20. In this situation, reducing the amount of data to be transferred is very important, and doing so reduces transfer time and cost in a way that has not been achieved conventionally.
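The effect on transfer time can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that every dictionary entry occupies the same 200 bytes and that the network carries 9,600 bit/s; neither figure comes from the text. Under that assumption transfer time scales with the number of entries, so sending 3 of the 10 entries takes 30% of the time.

```python
def transfer_seconds(num_entries, bytes_per_entry, link_bits_per_second):
    """Time to send num_entries dictionary entries of equal size over the link."""
    return num_entries * bytes_per_entry * 8 / link_bits_per_second


BYTES_PER_ENTRY = 200  # hypothetical entry size
LINK_BPS = 9600        # hypothetical link speed

full = transfer_seconds(10, BYTES_PER_ENTRY, LINK_BPS)    # entire user dictionary, FIG. 5(a)
reduced = transfer_seconds(3, BYTES_PER_ENTRY, LINK_BPS)  # reduced user dictionary, FIG. 5(c)
print(f"full: {full:.3f} s, reduced: {reduced:.3f} s")  # full: 1.667 s, reduced: 0.500 s
```

With real entries of unequal size, the ratio would depend on which words are selected, but the saving still grows with the number of words eliminated.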

Further, even when the computational resources available in the client 10 are limited and the likelihood calculation by the comparing section 13A of the reduced dictionary creation section 13 is therefore not highly accurate, the selection criterion of the word selection section 13C can be relaxed so that a larger number of words is selected.

With this configuration, the first exemplary embodiment of the invention can prevent deterioration in speech recognition accuracy, which is a unique advantage (positive effect) of the first exemplary embodiment.

This is because, even if the word selection section 13C selects words that eventually turn out to be unnecessary, so that unnecessary words are included in the reduced user dictionary, a correct result can be expected from the speech recognition processing performed by the server 20 as long as none of the words in the correct result are missed. In such a case, the reduced user dictionary becomes larger, which increases the data transfer time and cost, so the selection criterion of the word selection section 13C may be set with this trade-off in mind.

The first exemplary embodiment is characterized in that only the input speech is required to create the reduced user dictionary.

In the general-purpose technique, on the other hand, it has been necessary to narrow down the vocabulary to be transmitted from the client to the server by using information other than speech, such as the ID of the form serving as the input destination.

In the first exemplary embodiment, as described above, no information other than the input speech is needed to create the reduced user dictionary. Since the input speech is information that any speech recognition processing inevitably requires, the first exemplary embodiment is applicable to any situation in which speech recognition processing is performed.

This is a significant advantage of the present exemplary embodiment over the general-purpose technique, which is not applicable when no information other than the speech data to be recognized is available.

Note that in this exemplary embodiment, the selection criterion of the word selection section 13C can easily be determined in consideration of the communication speed and communication cost of the communication network 120. For example, if the communication speed is low or the communication cost is high, the maximum number of words stored in the reduced user dictionary can easily be limited so that transferring the reduced user dictionary from the client 10 to the server 20 does not exceed a certain limit on time and cost. It is also easy to adopt a configuration in which this adjustment is performed dynamically each time speech is input.
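One way to realize such a limit is to derive a maximum word count from a transfer-time budget and the current link speed, then keep the highest-likelihood words that also meet the threshold. This is a hypothetical sketch: the budget, the per-entry size, and the function names are assumptions, and a real client would measure link speed or cost in its own way.

```python
import math


def max_words_for_budget(budget_seconds, link_bits_per_second, bytes_per_entry):
    """Largest number of equal-size entries that can be sent within the time budget."""
    return math.floor(budget_seconds * link_bits_per_second / (8 * bytes_per_entry))


def select_within_budget(scores, threshold, max_words):
    """Keep words at or above the threshold, highest likelihood first, capped at max_words."""
    candidates = [(word, score) for word, score in scores.items() if score >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in candidates[:max_words]]


scores = {"sheisutajiamu": 0.8, "sheekusupiagaaden": 0.6, "meishiizu": 0.5}

fast_cap = max_words_for_budget(0.5, 9600, 200)  # 4800 bits / 1600 bits per entry -> 3
slow_cap = max_words_for_budget(0.5, 6400, 200)  # 3200 bits / 1600 bits per entry -> 2
print(select_within_budget(scores, 0.5, slow_cap))  # ['sheisutajiamu', 'sheekusupiagaaden']
```

Calling these functions once per utterance with the currently measured link speed gives the dynamic adjustment described above: on a slower link, the lowest-likelihood candidate is dropped first.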

As described above, the first exemplary embodiment has the following advantages.

That is, in the speech recognition processing performed by the server 20, a speech recognition result can be obtained by substantially using both the system dictionary and the user dictionary at the same time. Specifically, a user dictionary is installed in a client such as a mobile terminal held by the user, and the user registers necessary words in it. Although it would be best to transmit the user dictionary to the server at its original size and perform speech recognition using both the user dictionary and the system dictionary, transmitting the whole dictionary causes a problem in terms of transmission capacity.

Therefore, in this exemplary embodiment, words determined to be unnecessary for recognizing the input speech are eliminated to create a reduced user dictionary that is smaller than the user dictionary, and this reduced user dictionary is transmitted to the server together with the input speech data. This prevents the amount of data transmitted from the client to the server from increasing. Further, since the reduced user dictionary transmitted to the server contains the user-registered words needed to recognize the input speech, the input speech can be recognized reliably by combining the reduced user dictionary with the system dictionary of the server.

As described above, in this exemplary embodiment the reduced user dictionary is created from the user dictionary by eliminating words determined to be unnecessary for recognizing the input speech, so recognizing the input speech with the reduced user dictionary and the system dictionary is substantially the same as recognizing it with the user dictionary and the system dictionary. Hence, the speech recognition result can be obtained by substantially using both the system dictionary and the user dictionary at the same time.

Further, even when no information other than the input speech can be used, the reduced user dictionary can easily be created from the input speech alone, and since the amount transferred is significantly smaller than when the whole user dictionary is transferred as in the general-purpose technique, the amount of data transferred between the client and the server can be greatly reduced. Further, even if the resources available in the client are limited, the adverse effect on speech recognition accuracy in the system as a whole is small.

Since the first exemplary embodiment of the invention is configured and works as described above, when speech recognition processing is performed by the speech recognition device, the input speech and the reduced user dictionary are transmitted from the speech input device. Thus, the speech recognition device can recognize the input speech based on the reduced user dictionary and the system dictionary while maintaining the quality of speech recognition. Further, since the smaller reduced user dictionary, instead of the user dictionary, is transmitted from the speech input device, the amount of data transmitted to the speech recognition device and the communication cost can be reduced significantly compared with transmitting the entire user dictionary. As a result, the data transmission time and the speech recognition processing time in the speech recognition device can be reduced significantly, so speech recognition can be performed rapidly without increasing the burden on the system while maintaining the quality of speech recognition.

Second Exemplary Embodiment

Next, a second exemplary embodiment of the invention will be described with reference to FIGS. 6 and 7.

Components that are the same as those of the first exemplary embodiment are denoted by the same reference numerals.

In FIGS. 6 and 7, the speech recognition system of the second exemplary embodiment consists of a client terminal device (hereinafter referred to as the “client”) 60 working as a speech input device and a server 70 working as a speech recognition device.

As shown in FIG. 6, the client (client terminal device) 60 includes a speech input section 61, a data processing section 62, a storage section 63, a client communication section 64, a reduced dictionary creation program 65, and a recognition result output section 69.

The storage section 63 stores a user dictionary 63a as data. The data processing section 62 reads the reduced dictionary creation program 65 and controls data processing (creation of the reduced dictionary).

In accordance with the reduced dictionary creation program 65, the data processing section 62 performs the same processing as the reduced dictionary creation section 13 of the first exemplary embodiment. Specifically, the data processing section 62 refers to the speech input to the speech input section 61 of the client (client terminal device) 60 and creates a reduced user dictionary by selecting, from the user dictionary 63a in the storage section 63, words that are highly likely to be included in the speech. The reduced user dictionary created by the client 60 is transmitted by the client communication section 64 to the server (speech recognition device) 70 over the communication network 120. Reference numeral 69 indicates the recognition result output section, which outputs and displays the recognition result for the input speech transmitted from the server 70.

Further, as shown in FIG. 7, the server 70 working as a speech recognition device includes a server communication section 71, a data processing section 72, a storage section 73, and a speech recognition program 75. The storage section 73 stores a system dictionary 73a as data. The data processing section 72 reads the speech recognition program 75 and controls data processing.

In accordance with the speech recognition program 75, the data processing section 72 performs the same processing as the speech recognition section 22 of the first exemplary embodiment.

Specifically, the data processing section 72 first receives, via the server communication section 71, the input speech data and the reduced user dictionary transmitted from the client 60, and then performs speech recognition processing on the input speech data using both the system dictionary 73a in the storage section 73 and the reduced user dictionary.

In the second exemplary embodiment, the client 60 and the server 70 can be realized by any electronic devices that have a CPU and memory and can be connected over a network, such as personal computers (PCs), PDAs (Personal Digital Assistants), and mobile telephones. Further, if a computer has a general-purpose speech input function, that function can be used directly as the speech input section 61 of the client 60.

The functions of the other sections, the other configurations, and their operational effects are the same as those of the first exemplary embodiment.

As described above, the second exemplary embodiment has the following advantages.

First, as in the first exemplary embodiment, the speech recognition processing performed by the server 70 can obtain a speech recognition result that is substantially the same as when both the system dictionary and the user dictionary are used. In addition, the amount of data transferred between the client 60 and the server 70 is small, even when no information other than the input speech can be used.

Further, even when the resources available in the client 60 are limited, the adverse effect on speech recognition accuracy in the system as a whole is small.

As described above, in each exemplary embodiment, the client terminal device (client) serving as the speech input device extracts words from the user dictionary with reference to the input speech and creates the reduced user dictionary. The extraction is performed by determining, for each word in the user dictionary, how likely it is to be included in the input speech, and extracting the words with high likelihood. The input speech and the reduced user dictionary are then transmitted from the speech input device (client) to the speech recognition device (server). On the speech recognition device (server) side, speech recognition processing is performed using the system dictionary and the reduced user dictionary at the same time. Since the reduced user dictionary differs from the user dictionary of the client only by words that are unlikely to be included in the input speech, the speech recognition processing by the server obtains substantially the same result as when the system dictionary and the user dictionary are used at the same time.

Further, since the reduced user dictionary is expected to be significantly smaller than the user dictionary, the amount of data transferred between the speech input device and the speech recognition device can be reliably reduced compared with transmitting the entire user dictionary. Moreover, communication between the speech input device and the speech recognition device usually has a lower transfer speed and a higher communication cost than data transfer inside a server or a client. Therefore, reducing the amount of transferred data shortens the data transfer time, improves the responsiveness of speech recognition processing, and reduces communication costs.

Further, when the speech input device (client) determines whether the words in the user dictionary are likely to be included in the input speech, it only needs to take care not to miss words that are actually included at this stage. Even if unnecessary words are included in the reduced user dictionary, they do not affect the accuracy of the final speech recognition, because unnecessary words are expected not to be adopted in the speech recognition processing. Therefore, even when the resources available in the speech input device (client) are so limited that this processing cannot be performed with high accuracy, the accuracy of speech recognition is not directly adversely affected. In other words, the functions of the present invention can easily be installed in a speech input device (client) with limited resources such as CPU and memory.

Thus, according to the exemplary embodiments of the invention, the speech recognition processing by the speech recognition device can obtain a speech recognition result that is substantially the same as when both the system dictionary and the user dictionary are used, and the amount of data transferred between the speech input device and the speech recognition device is small even when no information other than the input speech can be used. Further, even when the resources available in the speech input device are limited, the adverse effect on speech recognition accuracy in the system as a whole can be kept small.

A speech recognition system according to another exemplary embodiment of the invention is a speech recognition system in which a speech input device, which converts speech into electric signals and inputs the signals as input speech, and a speech recognition device, which takes in and recognizes the input speech, are communicably connected. The speech recognition system may be configured such that the speech input device includes a user dictionary section which stores words to be used for recognizing the input speech, and a reduced user dictionary creation unit which extracts words corresponding to the input speech from the user dictionary section and creates a reduced user dictionary, and such that the speech recognition device includes a speech recognition unit which receives the input speech and the reduced user dictionary from the speech input device and recognizes the input speech based on the reduced user dictionary and a system dictionary, provided beforehand, which stores words for speech recognition.

With this configuration, for the speech recognition processing performed by the speech recognition device, the input speech and the reduced user dictionary are transmitted from the speech input device. Thus, the speech recognition device can recognize the input speech based on the reduced user dictionary and the system dictionary while maintaining the quality of speech recognition. Further, since the reduced user dictionary, which has a smaller data size, is transmitted from the speech input device instead of the user dictionary, the amount of data transferred to the speech recognition device and the communication costs can be reduced significantly compared with transmitting the entire user dictionary. In this respect, the data transmission time and the speech recognition processing time in the speech recognition device can be reduced significantly.

A speech recognition system according to another exemplary embodiment of the invention is a speech recognition system in which a speech input device, which converts speech into electric signals and inputs the signals as input speech, and a speech recognition device, which recognizes the input speech, are communicably connected. The speech input device includes a speech input section which inputs speech, a user dictionary section which stores words to be used for recognizing the input speech, a reduced user dictionary creation section which extracts words corresponding to the input speech from the user dictionary and creates a reduced user dictionary, and a transmission unit which transmits the input speech and the reduced user dictionary to the speech recognition device. Further, the speech recognition device may be configured to include a system dictionary section which stores words for speech recognition, a reception unit which receives the input speech and the reduced user dictionary transmitted from the speech input device, and a speech recognition section which performs speech recognition processing on the input speech using the system dictionary and the reduced user dictionary.

In this speech recognition system, since the speech recognition device performs speech recognition processing based on the reduced user dictionary and the system dictionary, it can obtain a speech recognition result that is substantially the same as when both the user dictionary and the system dictionary are used.

Further, the amount of data transferred from the speech input device to the speech recognition device and the communication costs can be reduced significantly compared with transmitting the entire user dictionary. In this respect, the load on the network can be reliably reduced, and the overall processing time for speech recognition can be shortened.

Note that the reduced user dictionary described above is a dictionary in which the words likely to be included in the input speech are selected from the words in the user dictionary. Further, the reduced user dictionary creation unit may be configured to compare the words in the user dictionary with the input speech, calculate the likelihood that each word appears in the input speech, and, based on the calculation result, select the words with high likelihoods to create the reduced user dictionary.

Thereby, since the user dictionary and the reduced user dictionary differ only by words that are unlikely to be included in the input speech, the speech recognition processing can obtain a speech recognition result that is substantially the same as when both the system dictionary and the user dictionary are used. Further, since the processing performed on the speech input device side only determines whether the words in the user dictionary are likely to be included in the input speech, it only needs to take care not to miss words actually included in the input speech at this stage. Therefore, unlike in the general-purpose technique, this processing does not directly affect the accuracy of speech recognition.

Further, the reduced user dictionary creation unit may be configured to create the reduced user dictionary by a word spotting method using the user dictionary.

Thereby, the word spotting method used in speech recognition can be applied to the creation of the reduced user dictionary, so that an effective reduced user dictionary is created.

Further, the reduced user dictionary creation unit may include a comparing section which compares the input speech with the words in the user dictionary and calculates the likelihood of each word being included in the input speech, a word temporarily storing section which temporarily stores pairs of each word and its calculated likelihood, and a word selection section which selects one or more words with high likelihood from the word temporarily storing section and creates a reduced user dictionary.
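The three sections can be arranged as a small pipeline. In the sketch below, the class and method names are hypothetical, and the scoring function is a deliberately simplified stand-in for word spotting: it compares each word's romanized spelling against every equal-length window of a romanized transcript using Python's `difflib.SequenceMatcher`, whereas the comparing section described above works on the input speech itself. The scorer is a constructor parameter so that an acoustic word spotter could replace it, and the 0.9 threshold is chosen for this toy scorer only.

```python
from difflib import SequenceMatcher


def toy_spotting_score(word, transcript):
    """Text-level stand-in for word spotting: best similarity over equal-length windows."""
    n = len(word)
    if n > len(transcript):
        return SequenceMatcher(None, word, transcript).ratio()
    return max(
        SequenceMatcher(None, word, transcript[i:i + n]).ratio()
        for i in range(len(transcript) - n + 1)
    )


class ReducedDictionaryCreator:
    def __init__(self, user_dictionary, threshold, scorer=toy_spotting_score):
        self.user_dictionary = user_dictionary
        self.threshold = threshold
        self.scorer = scorer
        self.word_store = {}  # word temporarily storing section: word -> likelihood

    def compare(self, speech):
        """Comparing section: score every user-dictionary word against the input."""
        self.word_store = {word: self.scorer(word, speech) for word in self.user_dictionary}

    def select(self):
        """Word selection section: copy entries whose likelihood meets the threshold."""
        return {
            word: self.user_dictionary[word]
            for word, likelihood in self.word_store.items()
            if likelihood >= self.threshold
        }

    def create(self, speech):
        self.compare(speech)
        return self.select()


user_dictionary = {"sheisutajiamu": {}, "sheekusupiagaaden": {}, "meishiizu": {}}
creator = ReducedDictionaryCreator(user_dictionary, threshold=0.9)
reduced = creator.create("sheisutajiamuwadokodesuka")
print(sorted(reduced))  # ['sheisutajiamu']
```

Keeping the scores in `word_store` mirrors the temporarily storing section, so the threshold can be changed and `select()` rerun without rescoring.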

A speech recognition method according to another exemplary embodiment of the invention may include: converting speech into electric signals and inputting the signals as input speech by a speech input device; extracting words relating to the input speech from a user dictionary for speech recognition provided in the speech input device and creating a reduced user dictionary; transmitting the input speech and the reduced user dictionary from the speech input device to a speech recognition device; and performing, in the speech recognition device that has received the input speech and the reduced user dictionary, speech recognition processing on the input speech based on a system dictionary for speech recognition provided in the speech recognition device and the reduced user dictionary.

A speech recognition method according to another exemplary embodiment of the invention may include: converting speech into electric signals and inputting the signals as input speech by a speech input device; extracting words relating to the input speech from a user dictionary for speech recognition provided in the speech input device and creating a reduced user dictionary; transmitting the input speech and the reduced user dictionary from the speech input device to a speech recognition device; receiving the input speech and the reduced user dictionary by the speech recognition device; and operating the speech recognition device to perform speech recognition processing on the input speech based on a system dictionary for speech recognition provided in the speech recognition device and the reduced user dictionary.

Thus, the speech recognition processing performed in the speech recognition device can obtain a speech recognition result that is substantially the same as when both the system dictionary and the user dictionary are used. Further, the amount of data transferred from the speech input device to the speech recognition device and the communication costs can be reduced compared with transmitting the entire user dictionary.

Note that, when creating the reduced user dictionary, it is acceptable to compare the words in the user dictionary with the input speech, calculate the likelihood that each word appears in the input speech, and select the words with high likelihoods based on the calculation result to create the reduced user dictionary.

Further, when creating the reduced user dictionary, it is acceptable to create it from the user dictionary by word spotting.

Further, when creating the reduced user dictionary, it is acceptable to compare the input speech with the words in the user dictionary, calculate the likelihood of each word appearing in the input speech, temporarily hold pairs of each word and its calculated likelihood, select one or more words with high likelihoods from the temporarily held words, and edit the selected words into dictionary form to create the reduced user dictionary.

A speech recognition program according to another exemplary embodiment of the invention may, in a speech recognition system in which a speech input device, which converts speech into electric signals and inputs the signals as input speech, and a speech recognition device, which takes in the speech input to the speech input device and applies recognition processing to it, are communicably connected, cause a computer to perform: a speech input control function that converts the speech received by the speech input device into electric signals and inputs the signals as input speech; a reduced user dictionary creation control function that extracts words relating to the input speech from a user dictionary for speech recognition provided in the speech input device and creates a reduced user dictionary; and a transmission control function that transmits the input speech and the reduced user dictionary from the speech input device to the speech recognition device for speech recognition processing.

A speech recognition processing program according to another exemplary embodiment of the invention may, in a speech recognition system in which a speech input device, which converts speech into electric signals, inputs the signals as input speech, and displays the recognition result, and a speech recognition device, which takes in the speech input to the speech input device, applies recognition processing, and sends the recognition result back to the speech input device, are communicably connected, cause a computer provided in the speech recognition device to execute: a recognition object reception processing function that receives the input speech transmitted from the speech input device and a reduced user dictionary derived from the user dictionary on the speech input device side; and a speech recognition processing function that performs speech recognition processing on the received input speech based on the system dictionary for speech recognition provided in the speech recognition device and the received reduced user dictionary.

With this configuration as well, speech recognition processing can be performed rapidly, as in the respective systems. Even though a reduced user dictionary is used, a speech recognition result that is substantially the same as when the user dictionary is used can be obtained; the amount of data transmitted from the speech input device to the speech recognition device and the communication costs can be reduced significantly compared with transmitting the entire user dictionary; and the overall processing time for speech recognition can be reduced.

Note that a configuration that causes a computer to create the reduced user dictionary by comparing the words in the user dictionary with the input speech, calculating the likelihood that each word appears in the input speech, and selecting the words with high likelihoods based on the calculation result is also acceptable.

Further, a configuration that causes a computer to create the reduced user dictionary by comparing the input speech with the words in the user dictionary, calculating the likelihood of each word appearing in the input speech, temporarily holding pairs of each word and its calculated likelihood, and selecting one or more words with high likelihoods from the temporarily held words is also acceptable.

Further, a configuration that creates the reduced user dictionary from the user dictionary by word spotting is also acceptable.

Thereby, the speech recognition processing performed by the speech recognition device can obtain a speech recognition result that is substantially the same as when both the system dictionary and the user dictionary are used. Further, since the processing by the speech input device only determines whether the words in the user dictionary are likely to be included in the input speech, it only needs to take care not to miss words actually appearing in the input speech at this stage, so the accuracy of speech recognition is not directly adversely affected.

While the present invention has been described with reference to the embodiments (and examples), the present invention is not limited to these embodiments (and examples). Various changes in form and details that can be understood by those skilled in the art may be made within the scope of the present invention.

This application is the National Phase of PCT/JP2008/054705, filed Mar. 14, 2008, which is based upon and claims the benefit of priority from Japanese patent application No. 2007-065229, filed on Mar. 14, 2007, the disclosure of which is incorporated herein in its entirety by reference.

INDUSTRIAL APPLICABILITY

As described in the above embodiments, the present invention is applicable to a speech recognition system in which speech is input to a client and speech recognition is performed in a server connected to the client over a communication network. Further, a wide variety of terminal devices can be used as the client regardless of their size and form, including not only mobile terminals such as PDAs and mobile telephones but also PCs and car navigation terminals connected over networks.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the configuration of a speech recognition system according to a first exemplary embodiment of the invention.

FIG. 2 is a flowchart showing the operation of the speech recognition system disclosed in FIG. 1.

FIG. 3 is a block diagram showing the configuration of a reduced dictionary creation section of the speech recognition system disclosed in FIG. 1.

FIG. 4 is a flowchart showing the operation of the reduced dictionary creation section disclosed in FIG. 3.

FIG. 5 is a table showing examples of a user dictionary of the client disclosed in FIG. 1 and a system dictionary of a server.

FIG. 6 is a block diagram showing the configuration of a client of a speech recognition system according to a second exemplary embodiment of the invention.

FIG. 7 is a block diagram showing the configuration of a server part of the speech recognition system disclosed in FIG. 6.

FIG. 8 is a block diagram showing the configuration of a general-purpose speech recognition system.

FIG. 9 is a block diagram showing the configuration of another general-purpose speech recognition system.

DESCRIPTION OF REFERENCE NUMERALS

-   10, 60 client (client terminal device) as speech input device
-   11, 61 speech input section
-   12 user dictionary section
-   13 reduced dictionary creation section (reduced dictionary creation unit)
-   13A comparing section
-   13B word temporarily storing section
-   13C word selection section
-   13D reduced user dictionary section
-   14 client communication section (transmission unit, reception unit)
-   15, 69 recognition result output section
-   20, 70 server as speech recognition device
-   21 system dictionary
-   22 speech recognition section (speech recognition unit)
-   23 server communication section (transmission unit, reception unit)
-   62 data processing section (creation of reduced user dictionary)
-   72 data processing section (speech recognition processing)
-   73 storage section (storage section of user dictionary)
-   73a system dictionary
-   75 speech recognition program
-   120 communication network

What is claimed is:
 1. A speech recognition system, comprising: a speech input device configured to input a speech as an input speech by converting the speech to an electric signal; and a speech recognition device configured to take in and recognize the speech input to the speech input device, wherein the speech input device and the speech recognition device are communicatively connected to each other, wherein the speech input device comprises: a user dictionary section storing words for use with recognition of the input speech; and a reduced user dictionary creation unit configured to create a reduced user dictionary by extracting words corresponding to the input speech from the user dictionary section, and wherein the speech recognition device comprises: a speech recognition unit configured to input the input speech and the reduced user dictionary from the speech input device and recognize the input speech based on the reduced user dictionary and a built-in system dictionary storing words for speech recognition, wherein the reduced user dictionary creation unit includes: a comparing section which compares the input speech and a word in the user dictionary and compiles a likelihood of each word appearing in the input speech, a word temporarily storing section which temporarily stores a set of each word and a corresponding likelihood compiled, and a word selection section which selects one or a plurality of words having high usage from the word temporarily storing section to thereby create the reduced user dictionary.
 2. The speech recognition system as claimed in claim 1, wherein the reduced user dictionary is a dictionary in which a word having a possibility of being included in the input speech is selected from the words in the user dictionary.
 3. The speech recognition system as claimed in claim 2, wherein the reduced user dictionary creation unit compares a word in the user dictionary and the input speech, calculates a likelihood that the word appears in the input speech, and based on a calculation result, selects a word having a high likelihood to thereby create the reduced user dictionary.
 4. The speech recognition system as claimed in claim 2, wherein the reduced user dictionary creation unit creates the reduced user dictionary by means of a word spotting method using the user dictionary.
 5. The speech recognition system as claimed in claim 1, wherein the reduced user dictionary creation unit compares a word in the user dictionary and the input speech, calculates a likelihood that the word appears in the input speech, and based on a calculation result, selects a word having a high likelihood to thereby create the reduced user dictionary.
 6. The speech recognition system as claimed in claim 1, wherein the reduced user dictionary creation unit creates the reduced user dictionary by means of a word spotting method using the user dictionary.
 7. A speech recognition system, comprising: a speech input device configured to input a speech as an input speech by converting the speech to an electric signal; and a speech recognition device configured to take in and recognize the input speech input to the speech input device, wherein the speech input device and the speech recognition device are communicatively connected to each other, wherein the speech input device comprises: a speech input section for inputting a speech; a user dictionary section storing words for use with recognition of the input speech; a reduced user dictionary creation unit configured to create a reduced user dictionary by extracting words corresponding to the input speech from the user dictionary; and a transmission unit configured to transmit the input speech and the reduced user dictionary to the speech recognition device, and wherein the speech recognition device comprises: a system dictionary section storing words for speech recognition; a receiving unit configured to receive the input speech and the reduced user dictionary transmitted from the speech input device; and a speech recognition unit configured to perform speech recognition on the input speech by using the system dictionary and the reduced user dictionary, wherein the reduced user dictionary creation unit includes: a comparing section which compares the input speech and a word in the user dictionary and compiles a likelihood of each word appearing in the input speech, a word temporarily storing section which temporarily stores a set of each word and a corresponding likelihood compiled, and a word selection section which selects one or a plurality of words having high usage from the word temporarily storing section to thereby create the reduced user dictionary.
 8. The speech recognition system as claimed in claim 7, wherein the reduced user dictionary is a dictionary in which a word having a possibility of being included in the input speech is selected from the words in the user dictionary.
 9. The speech recognition system as claimed in claim 7, wherein the reduced user dictionary creation unit compares a word in the user dictionary and the input speech, calculates a likelihood that the word appears in the input speech, and based on a calculation result, selects a word having a high likelihood to thereby create the reduced user dictionary.
 10. The speech recognition system as claimed in claim 7, wherein the reduced user dictionary creation unit creates the reduced user dictionary by means of a word spotting method using the user dictionary.
 11. A speech recognition method, comprising: converting a speech into an electric signal and inputting the converted speech as an input speech into a speech input device; extracting words related to the input speech from a user dictionary for speech recognition included in the speech input device to create a reduced user dictionary; transmitting the input speech and the reduced user dictionary from the speech input device to a speech recognition device; and at the speech recognition device having received the input speech and the reduced user dictionary, performing speech recognition on the input speech based on a system dictionary for speech recognition included in the speech recognition device and the received reduced user dictionary, wherein the extracting includes: comparing the input speech and a word in the user dictionary and compiling a likelihood of each word appearing in the input speech, temporarily storing a set of each word and a corresponding likelihood compiled, and selecting one or a plurality of words having high usage to thereby create the reduced user dictionary.
 12. The speech recognition method as claimed in claim 11, wherein the reduced user dictionary is created by: comparing words included in the user dictionary with the input speech, calculating likelihood of each word being present in the input speech, and selecting a word having high likelihood of being present in the input speech based on a result of the calculation.
 13. The speech recognition method as claimed in claim 11, wherein the reduced user dictionary is created from the user dictionary by word spotting.
 14. The speech recognition method as claimed in claim 11, wherein the reduced user dictionary is created by: comparing the input speech with words in the user dictionary, calculating likelihood of each word being present in the input speech, temporarily storing a set of the calculated likelihood and a corresponding word, selecting one or more words having high likelihood from temporarily-stored words, and editing the selected words to be in a form of a dictionary.
 15. A speech recognition method, comprising: converting a speech into an electric signal and inputting the converted speech as an input speech into a speech input device; extracting words related to the input speech from a user dictionary for speech recognition included in the speech input device to create a reduced user dictionary; transmitting the input speech and the reduced user dictionary from the speech input device to a speech recognition device; receiving at the speech recognition device the input speech and the reduced user dictionary; and at the speech recognition device, performing speech recognition on the input speech based on a system dictionary for speech recognition included in the speech recognition device and the received reduced user dictionary, wherein the extracting includes: comparing the input speech and a word in the user dictionary and compiling a likelihood of each word appearing in the input speech, temporarily storing a set of each word and a corresponding likelihood compiled, and selecting one or a plurality of words having high usage to thereby create the reduced user dictionary.
 16. The speech recognition method as claimed in claim 15, wherein the reduced user dictionary is created by: comparing words included in the user dictionary with the input speech, calculating likelihood of each word being present in the input speech, and selecting a word having high likelihood of being present in the input speech based on a result of the calculation.
 17. The speech recognition method as claimed in claim 15, wherein the reduced user dictionary is created from the user dictionary by word spotting.
 18. The speech recognition method as claimed in claim 15, wherein the reduced user dictionary is created by: comparing the input speech with words in the user dictionary, calculating likelihood of each word being present in the input speech, temporarily storing a set of the calculated likelihood and a corresponding word, selecting one or more words having high likelihood from temporarily-stored words, and editing the selected words to be in a form of a dictionary.
 19. A non-transitory computer readable storage medium storing a speech recognition program, in a speech recognition system including a speech input device configured to input a speech as an input speech by converting the speech to an electric signal and a speech recognition device configured to take in and recognize the speech input to the speech input device, for causing a computer included in the speech input device to execute: converting speech into an electric signal and inputting the converted speech as input speech into the speech input device; extracting words related to the input speech from a user dictionary for speech recognition included in the speech input device to create a reduced user dictionary; and transmitting the input speech and the reduced user dictionary from the speech input device to a speech recognition device for performing speech recognition, wherein the extracting includes: comparing the input speech and a word in the user dictionary and compiling a likelihood of each word appearing in the input speech, temporarily storing a set of each word and a corresponding likelihood compiled, and selecting one or a plurality of words having high usage to thereby create the reduced user dictionary.
 20. The non-transitory computer readable storage medium storing a speech recognition program as claimed in claim 19, wherein the reduced user dictionary is created by: comparing words included in the user dictionary with the input speech, calculating likelihood of each word being present in the input speech, and selecting a word having high likelihood of being present in the input speech based on a result of the calculation.
 21. The non-transitory computer readable storage medium storing a speech recognition program as claimed in claim 19, wherein the reduced user dictionary is created by: comparing the input speech with words in the user dictionary, compiling likelihood of each word being present in the input speech, temporarily storing a set of the compiled likelihood and a corresponding word, and selecting one or more words having high likelihood from temporarily-stored words.
 22. The non-transitory computer readable storage medium storing a speech recognition program as claimed in claim 19, wherein the reduced user dictionary is created from the user dictionary by word spotting.
 23. A speech recognition system, comprising: speech input means for inputting a speech as an input speech by converting the speech to an electric signal; and speech recognition means for taking in and recognizing the speech input to the speech input device, wherein the speech input means and the speech recognition means are communicatively connected to each other, wherein the speech input means comprises: user dictionary means for storing words for use with recognition of the input speech; and reduced user dictionary creation means for creating a reduced user dictionary by extracting words corresponding to the input speech from the user dictionary section, and wherein the speech recognition means comprises: speech recognition means for inputting the input speech and the reduced user dictionary from the speech input means and recognizing the input speech based on the reduced user dictionary and a built-in system dictionary storing words for speech recognition, wherein the reduced user dictionary creation means includes: a comparing section which compares the input speech and a word in the user dictionary and compiles a likelihood of each word appearing in the input speech, a word temporarily storing section which temporarily stores a set of each word and a corresponding likelihood compiled, and a word selection section which selects one or a plurality of words having high usage from the word temporarily storing section to thereby create the reduced user dictionary. 
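The reduced user dictionary creation recited in the claims can be sketched as below. It follows the claimed sequence: a comparing section scores words, a word temporarily storing section holds (word, likelihood) pairs, and a word selection section edits the chosen words into dictionary form. All of the following are hypothetical assumptions:

- The input speech has already been converted on the client into a sequence of phoneme symbols.
- Each user dictionary entry stores a phoneme string.
- The "likelihood" is a similarity ratio from Python's `difflib.SequenceMatcher`, taken as the best score over a sliding window the length of the word. This window search is a crude, text-level stand-in for acoustic word spotting, not an acoustic likelihood.
- The `top_k` and `threshold` values are illustrative.
- "High usage" in the claims is read here as high likelihood.

```python
from difflib import SequenceMatcher


def spot_likelihood(speech_phones, word_phones):
    """Comparing section (13A): crude word-spotting score in [0, 1].

    Slides a window the length of the word over the input phoneme
    sequence and keeps the best SequenceMatcher similarity ratio.
    """
    n, m = len(speech_phones), len(word_phones)
    if n == 0 or m == 0:
        return 0.0
    if n < m:  # input shorter than the word: compare whole sequences
        return SequenceMatcher(None, speech_phones, word_phones).ratio()
    return max(
        SequenceMatcher(None, speech_phones[i:i + m], word_phones).ratio()
        for i in range(n - m + 1)
    )


def create_reduced_user_dictionary(speech_phones, user_dictionary, top_k=2, threshold=0.7):
    # Word temporarily storing section (13B): (word, likelihood) pairs.
    scored = [
        (word, spot_likelihood(speech_phones, phones.split()))
        for word, phones in user_dictionary.items()
    ]
    # Word selection section (13C): highest likelihood first, keep up to top_k above threshold.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    selected = [(w, s) for w, s in scored if s >= threshold][:top_k]
    # Reduced user dictionary section (13D): edit the selection into dictionary form.
    return {w: user_dictionary[w] for w, _ in selected}


speech = "k o n n i ch i w a t a n a k a s a n".split()
user_dictionary = {
    "Tanaka": "t a n a k a",
    "Suzuki": "s u z u k i",
    "Nakata": "n a k a t a",
}
reduced = create_reduced_user_dictionary(speech, user_dictionary)
print(reduced)  # {'Tanaka': 't a n a k a', 'Nakata': 'n a k a t a'}
```

The similar-sounding "Nakata" survives selection alongside "Tanaka". In this sketch the reduced dictionary is a candidate list, and the final decision is left to the server-side recognizer working with the system dictionary.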